Skip to content

PS-11001: Implement logic for purge_binlogs mode (local filesystem storage only)#133

Open
kamil-holubicki wants to merge 2 commits into
Percona-Lab:mainfrom
kamil-holubicki:PS-11001-part-2
Open

PS-11001: Implement logic for purge_binlogs mode (local filesystem storage only)#133
kamil-holubicki wants to merge 2 commits into
Percona-Lab:mainfrom
kamil-holubicki:PS-11001-part-2

Conversation

@kamil-holubicki
Copy link
Copy Markdown
Collaborator

Add 'binlog_server purge_binlogs <binlog_name>' which removes
every binlog file with sequence number less than or equal to <binlog_name>
(a contiguous prefix). The current tail is refused: emptying the storage
would lose the resume position and force the next 'fetch' / 'pull' to
re-stream from the beginning of the source's retained binlog history.

Phase 1 only: 'file' backend; v2 will extend to 's3' once 'remove_object'
is wired to 'DeleteObject'.

Self-healing recovery from a purge crashed
between the index rewrite and the payload/metadata deletion is left for
later phases - existing constructor validators refuse to open such
a storage and surface it for manual cleanup.

https://perconadev.atlassian.net/browse/PS-11001

Added a new 'binlog_server list <json_config_file>' subcommand that
enumerates every binlog file currently present in the configured
storage and prints them, in chronological (storage) order, as a JSON
document on stdout. The per-record schema matches the existing
'search_by_timestamp' / 'search_by_gtid_set' responses (name, size,
uri, previous_gtids, added_gtids, min_timestamp, max_timestamp) and
the 'binsrv::models::search_response' class is reused as-is.
Unlike the search subcommands, 'list' returns 'status:success' with
an empty 'result' array on an empty storage instead of raising an
error. This makes 'list' the primary diagnostic command operators
can use to distinguish a freshly initialized (empty) storage from a
corrupted one.

Co-authored-by: Cursor <cursoragent@cursor.com>
#include <string_view>
#include <system_error>

#include <fcntl.h>
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't like polluting standard-conformant c++ code with posix. I also understand that in our project we will definitely need fsync() for files and / or directories. What I suggest is to create a separate header, say util/native_file_operations_helpers.hpp with c++-conformant interface (no posix includes). At the same time put the implementation with all the posix-spacific calls to the util/native_file_operations_helpers_posix.cpp. Who knows, may be one day we will have util/native_file_operations_helpers_windows.cpp.
In this case we will include either util/native_file_operations_helpers_posix.cpp or util/native_file_operations_helpers_windows.cpp to the project conditionally using CMake means.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree

--exec $BINSRV purge_binlogs $MYSQL_TMP_DIR/no_such_config.json $first_binlog > $read_from_file
--source include/read_file_to_var.inc
--assert(`SELECT JSON_EXTRACT('$result', '$.status') = 'error'`)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably add "trying to execute binlog_purge with currently active file specified in the command line". If I haven't missed this among other cases.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see case 9 below

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I knew I missed it :)

Comment thread src/binsrv/basic_storage_backend.hpp Outdated
// remove several objects back-to-back from the same backend and
// want to amortise the fsync cost over the whole batch by passing
// 'do_fsync=true' on the final call only.
void remove_object(std::string_view name, bool do_fsync = true);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the idea of exposing do_fsync() to public interface. I'd rather have
2 private methods

virtual void do_remove_object(std::string_view name) = 0;
virtual void do_fsync() = 0;

and 2 public methods:

void remove_object(std::string_view name) {
   do_remove_object(name);
   do_fsync();
}

void remove_objects(std::collection<std::string_view> names) {
    for(const auto &name: names) {
     do_remove_object();
   }
   do_fsync();
}

What do you think?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the way I proposed is more flexible. But if we agree that do_fsync is must have after remove_object or remove_objects, we can go your way. To be honest I think both approaches are more or less equal.
If you insist, I will change.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would insist, flushing is not something users of the class should care about. For them, it should just work.


void filesystem_storage_backend::do_put_object(std::string_view name,
util::const_byte_span content) {
// atomic-overwrite is implemented via the standard POSIX
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one concern here. The suggested scehema with single rename is totally OK for POSIX, renaming a file to an existing one is a well-defined atomic operation. No questions here.

However, as you may have already noticed our S3 uploads are not optimal. We always completely owerwrite full existing object. However, there is an area for improvemnt here. We can initiate a multipart upload and compose a new object from a reference to an existing one and a new data block.
In other words, if we have

/path/object1.data (1024 bytes)

we can create

/path/object2.data (2048 bytes total : 1024 from /path/object1.data and 1024 bytes from new data block upload)

However, S3 does NOT have a rename operation at all. So, you won't be able to just change the name of
/path/object2.data back to /path/object1.data.
You can only

  1. delete /path/object1.data`
  2. copy /path/object2.datato/path/object1.data
  3. delete /path/object2.data`

So, my question here is what atomic rename scheme we should use so that we would have a single recovery algorithm for both file and s3 storage backends.

Still not 100% sure here but one of the options could be (for file storage backend)

  1. create .tmp1 with new content
  2. move original to .tmp2
  3. rename .tmp1 to original
  4. remove .tmp2

Copy link
Copy Markdown
Collaborator Author

@kamil-holubicki kamil-holubicki May 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per documentation S3 PutObject is atomic. So our implementation of s3_storage_backend::do_put_object is atomic. No changes needed here.
In other words: create - rename pattern is only for posix. For S3, PutObject is already atomic.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Kamil. Currently, PutObject is atomic and no problems with it. Here I am going a bit ahead of time when we decide to simulate this append operation with MultipartUploads. But again, this is definitely another task.

@kamil-holubicki kamil-holubicki force-pushed the PS-11001-part-2 branch 2 times, most recently from 6564e25 to f227868 Compare May 27, 2026 09:43
Add 'binlog_server purge_binlogs <config> <binlog_name>' which removes
every binlog file with sequence number less than or equal to <binlog_name>
(a contiguous prefix). The current tail is refused: emptying the storage
would lose the resume position and force the next 'fetch' / 'pull' to
re-stream from the beginning of the source's retained binlog history.

Phase 1 only: 'file' backend; v2 will extend to 's3' once 'remove_object'
is wired to 'DeleteObject'.

Self-healing recovery from a purge crashed
between the index rewrite and the payload/metadata deletion is left for
later phases - existing constructor validators refuse to open such
a storage and surface it for manual cleanup.

Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants